EFFICIENT CALCULATION OF RISK MEASURES BY
IMPORTANCE SAMPLING - THE HEAVY TAILED CASE

HENRIK HULT AND JENS SVENSSON 



Abstract. Computation of extreme quantiles and tail-based risk measures 
using standard Monte Carlo simulation can be inefficient. A method to speed 
up computations is provided by importance sampling. We show that impor- 
tance sampling algorithms, designed for efficient tail probability estimation, 
can significantly improve Monte Carlo estimators of tail-based risk measures. 
In the heavy-tailed setting, when the random variable of interest has a regu- 
larly varying distribution, we provide sufficient conditions for the asymptotic 
relative error of importance sampling estimators of risk measures, such as 
Value-at-Risk and expected shortfall, to be small. The results are illustrated 
by some numerical examples. 



1. Introduction 

Risk measures are frequently used to quantify uncertainty in a financial or ac- 
tuarial context. Many risk measures, such as Value-at-Risk and expected shortfall, 
depend on the tail of the loss distribution. Exact formulas for computing risk mea- 
sures are only available for simple models and an alternative is to use Monte Carlo 
simulation. However, standard Monte Carlo can be inefficient when the function of 
interest depends on the occurrence of rare events. A large number of samples may 
be needed for accurate computation of extreme risk measures with standard Monte 
Carlo, resulting in high computational cost. An alternative to reduce the compu- 
tational cost without loss of accuracy is provided by importance sampling. There 
is a vast literature on the design of importance sampling algorithms for computing 
rare event probabilities. However, for computing quantiles and other risk measures
the literature is not as developed. Glasserman et al. (2002) propose a method for
efficient computation of quantiles of a heavy-tailed portfolio. Their method is based
on efficient algorithms designed for rare event probabilities. The exceedance
probability is computed for a suggested initial value of the quantile. Then the quantile
estimate is updated depending on the computed probability and a search algorithm
is constructed to generate subsequent and more accurate quantile estimates. We
follow a more direct approach suggested by Glynn (1996) for computing quantiles
and extend it to handle expected shortfall.



Let us first give a brief description of the problem. Let $X$ be a random variable
with distribution function (d.f.) $F$ and continuous density $f$.
Consider the problem of computing its $p$th quantile, i.e. a number $\lambda_p$ such that
$P(X > \lambda_p) \le 1 - p$, for some $p \in (0,1)$. If possible the quantile is calculated by
inverting the d.f.

\[ \lambda_p = F^{\leftarrow}(p) := \inf\{x : 1 - F(x) \le 1 - p\}. \]

When this is impossible an alternative is to use simulation. Computation of $F^{\leftarrow}(p)$
by standard Monte Carlo can be implemented as follows. Generate $N$ independent
copies of $X$, denoted $X_1, \dots, X_N$. The empirical distribution function (e.d.f.) of
the sample is given by

\[ F_N(x) = \frac{1}{N} \sum_{i=1}^{N} I\{X_i \le x\} \]

and the quantile estimate is given by $F_N^{\leftarrow}(p)$. For extreme quantiles, when $1 - p$ is
very small, standard Monte Carlo can be inefficient. Since only a small fraction of
the sample will be located in the tail, large samples are needed to obtain reliable
estimates. A rough approach for quantifying the efficiency of Monte Carlo estimates
is to first consider a central limit theorem for $F_N^{\leftarrow}(p)$. Suppose (this is true under
suitable conditions)

\[ \sqrt{N}\,\big(F_N^{\leftarrow}(p) - F^{\leftarrow}(p)\big) \xrightarrow{w} N(0, \sigma_p^2), \]

where $\xrightarrow{w}$ denotes weak convergence. It is desirable to have the asymptotic standard
deviation $\sigma_p$ of roughly the same size as the quantity $F^{\leftarrow}(p)$ we are estimating. For
standard Monte Carlo the asymptotic standard deviation is

\[ \frac{\sqrt{p(1-p)}}{f(F^{\leftarrow}(p))} \sim \frac{\sqrt{1-p}}{f(F^{\leftarrow}(p))}, \]

which is typically much larger than $F^{\leftarrow}(p)$ for $p$ close to 1.
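To make the standard Monte Carlo quantile estimator concrete, the following Python sketch inverts the e.d.f. of a sample and evaluates the ratio of the asymptotic standard deviation to the quantile. The Pareto tail $P(X > x) = x^{-\alpha}$, $x \ge 1$, and both function names are assumptions chosen for illustration; they do not appear in the paper.

```python
import math


def empirical_quantile(sample, p):
    """Generalized inverse of the e.d.f.: inf{x : F_N(x) >= p}."""
    xs = sorted(sample)
    n = len(xs)
    # F_N(x_(k)) = k / N, so the answer is the order statistic with k = ceil(N p).
    # The small offset guards against floating-point round-off in N * p.
    k = max(1, math.ceil(n * p - 1e-12))
    return xs[k - 1]


def mc_relative_std_pareto(p, alpha):
    """Ratio sqrt(p (1 - p)) / (f(q) q), q = F^{<-}(p), for the assumed Pareto tail."""
    # For the tail x^(-alpha) one has q * f(q) = alpha * (1 - p).
    return math.sqrt(p * (1.0 - p)) / (alpha * (1.0 - p))
```

Under this assumed Pareto model the ratio equals $\sqrt{p/(1-p)}/\alpha$, which grows without bound as $p \to 1$; for $\alpha = 2$ it is roughly 5 at $p = 0.99$ and roughly 50 at $p = 0.9999$.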

Consider, as an alternative to standard Monte Carlo, the method of importance
sampling. Then the sample $X_1, \dots, X_N$ is generated from the sampling distribution
$\nu$ and the importance sampling tail e.d.f. is given by

\[ \overline{F}_{\nu,N}(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{d\mu}{d\nu}(X_i)\, I\{X_i > x\}, \]

where $\mu$ denotes the distribution of $X$ and $\overline{F} = 1 - F$ its tail.
The quantile estimate is then given by $(1 - \overline{F}_{\nu,N})^{\leftarrow}(p) = \inf\{x : \overline{F}_{\nu,N}(x) \le 1 - p\}$.
The goal is to choose $\nu$ to get many samples in the tail of the original distribution
and with small Radon-Nikodym weights. Again, a rough evaluation of the performance
may be done by studying the limiting variance in a central limit theorem
of the form

\[ \sqrt{N}\,\big((1 - \overline{F}_{\nu,N})^{\leftarrow}(p) - F^{\leftarrow}(p)\big) \xrightarrow{w} N(0, \sigma_p^2). \]

It turns out that the asymptotic properties (as $p \to 1$) of $\sigma_p^2$ are closely related to
the asymptotics of the second moment of importance sampling algorithms designed
for computing rare event probabilities. This indicates that efficient algorithms for
computing rare event probabilities are indeed useful for computing quantiles.
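The importance sampling quantile estimate can be sketched in a few lines of Python. The function names are hypothetical, the weights `ws` stand for the likelihood ratios evaluated at the sample points, and the sketch assumes distinct sample values (which holds almost surely for a continuous sampling distribution).

```python
def is_tail_edf(xs, ws, t):
    """Weighted tail e.d.f.: (1/N) * sum of w_i over the samples with x_i > t."""
    return sum(w for x, w in zip(xs, ws) if x > t) / len(xs)


def is_quantile(xs, ws, p):
    """inf{x : weighted tail e.d.f.(x) <= 1 - p}, assuming distinct sample values."""
    n = len(xs)
    if sum(ws) / n <= 1.0 - p:
        # The tail e.d.f. is below 1 - p everywhere, so the infimum is -infinity.
        return float("-inf")
    pairs = sorted(zip(xs, ws))
    tail = 0.0  # total weight strictly to the right of the current sample point
    answer = pairs[-1][0]
    for x, w in reversed(pairs):  # walk from the largest sample downwards
        if tail / n > 1.0 - p:
            break
        answer = x
        tail += w
    return answer
```

With all weights equal to one the function reduces to the ordinary empirical quantile, which is a convenient sanity check.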

We use a similar approach to evaluate the performance of importance sampling
algorithms for computing expected shortfall. For a random variable $X$ with d.f. $F$
expected shortfall at level $p \in (0,1)$ can be expressed as

\[ \text{Expected shortfall}_p(X) = \frac{1}{1-p} \int_p^1 F^{\leftarrow}(u)\,du =: \gamma_p(F^{\leftarrow}). \]

The standard Monte Carlo estimate based on a sample $X_1, \dots, X_N$ is given by
$\gamma_p(F_N^{\leftarrow})$ whereas the importance sampling estimate is given by $\gamma_p((1 - \overline{F}_{\nu,N})^{\leftarrow})$.
A central limit theorem is derived for the expected shortfall estimate based on
importance sampling and the properties of the limiting variance are studied as
$p \to 1$.
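As an illustration of the expected shortfall estimate, the following Python sketch integrates the step-shaped quantile function obtained from a weighted sample. The function name and the decision to raise an error when the weighted sample carries too little tail mass are assumptions of this sketch, not part of the paper.

```python
def is_expected_shortfall(xs, ws, p):
    """(1 / (1 - p)) * integral over [p, 1] of the inverse of 1 - weighted tail e.d.f."""
    n = len(xs)
    level = 1.0 - p
    if sum(ws) / n < level:
        raise ValueError("weighted sample carries less than 1 - p of tail mass")
    integral = 0.0
    mass = 0.0  # normalized weight of the samples larger than the current one
    for x, w in sorted(zip(xs, ws), reverse=True):
        new_mass = mass + w / n
        # The estimated quantile function equals x for u in (1 - new_mass, 1 - mass].
        integral += x * (min(new_mass, level) - min(mass, level))
        mass = new_mass
        if mass >= level:
            break
    return integral / level
```

Setting every weight to one recovers the standard Monte Carlo estimate based on the e.d.f.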

When evaluating the asymptotic variance for p close to 1 we restrict attention to 
the heavy-tailed case. More precisely, it is assumed that the original distribution 
has a regularly varying tail. This is motivated by applications, e.g. finance and 
insurance, where heavy-tailed data are frequently observed and evaluation of risk 
measures is important for risk control. 

For computation of rare event tail probabilities, the importance sampling estimate
of $p_\lambda = P(X > \lambda)$ is given by $\hat{p}_\lambda = \overline{F}_{\nu,N}(\lambda)$. Typically, the performance of a rare
event simulation algorithm is evaluated in terms of the relative error,

\[ \text{Relative error}: \quad \frac{\sqrt{\mathrm{Var}(\hat{p}_\lambda)}}{p_\lambda}. \]

An algorithm is said to be asymptotically optimal if the relative error tends to
0 as $\lambda \to \infty$. If the relative error remains bounded as $\lambda \to \infty$, the algorithm is
said to have bounded relative error. In the heavy-tailed setting there exist several
algorithms for the case where $X$ is given by the value at time $n$ of a random
walk. Bassamboo et al. (2007) show that for such algorithms a necessary condition
for them to achieve asymptotic optimality is that they are state-dependent.
Dupuis et al. (2007) develop the first such algorithm, which almost achieves
asymptotic optimality for regularly varying distributions. Blanchet and Glynn (2008)
propose a state-dependent algorithm with bounded relative error for a more general
class of heavy-tailed distributions. Blanchet and Liu (2008) consider the case
where the number of steps of the random walk and $\lambda$ tend to infinity simultaneously
and develop an algorithm with bounded relative error. Hult and Svensson
(2009) consider algorithms of the same kind as Dupuis et al. (2007), and show that
they can be made asymptotically optimal.
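For comparison, the relative error of the standard Monte Carlo estimate of a tail probability has a closed form, since the estimate is an average of $N$ independent indicators with variance $p_\lambda(1 - p_\lambda)/N$. The short Python sketch below (function names are hypothetical) evaluates it and the sample size needed to reach a target relative error.

```python
import math


def crude_mc_relative_error(p_lambda, n_samples):
    """sqrt(Var) / p for the average of n_samples indicators with mean p_lambda."""
    return math.sqrt((1.0 - p_lambda) / (n_samples * p_lambda))


def crude_mc_samples_needed(p_lambda, target):
    """Smallest N with relative error at most `target` (up to floating-point rounding)."""
    return math.ceil((1.0 - p_lambda) / (p_lambda * target ** 2))
```

For a probability of order $10^{-4}$ and a target relative error of 10 percent this gives on the order of a million samples, which illustrates why an algorithm whose relative error stays bounded as $\lambda \to \infty$ is attractive.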

The paper is organized as follows. In Section 2 we review some standard results 
from empirical process theory. In Section 3 we derive central limit theorems for em- 
pirical quantiles for the empirical measures obtained from an importance sampling 
algorithm. In Section 4 we consider computation of risk measures for heavy-tailed 
(regularly varying) random variables. Sufficient conditions for importance sampling 
algorithms designed for rare event probability estimation to have small asymptotic 
variance are provided. Finally, in Section 5 the procedure is illustrated for com- 
putation of risk measures when the variable of interest is the position of a finite 
random walk with regularly varying steps. 



2. Empirical processes

In this section we review some basic results from the theory of empirical processes.
We refer to van der Vaart and Wellner (1996) for a thorough introduction
(see also Csörgő et al. (1986)).

Let $\{X_i\}_{i=1}^{\infty}$ be independent identically distributed random variables with
distribution $\mu$. Denote by $\mu_N$ the empirical measure based on the first $N$ observations;

\[ \mu_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_i}, \]

where $\delta_x$ is the Dirac measure at $x$. For a collection $\mathcal{F}$ of real valued measurable
functions, the empirical measure induces a map from $\mathcal{F} \to \mathbb{R}$ by $f \mapsto \mu_N(f) = \int f\,d\mu_N$.
Assuming $\sup_{f \in \mathcal{F}} |f(x) - \mu(f)| < \infty$ for each $x$, the empirical process $\xi_N$,
given by

\[ \xi_N(f) = \sqrt{N}\,\big(\mu_N(f) - \mu(f)\big), \]

can be viewed as a map into $l^{\infty}(\mathcal{F})$; the space of bounded functions $\mathcal{F} \to \mathbb{R}$ equipped
with the uniform metric. The collection $\mathcal{F}$ is called $\mu$-Donsker if

\[ \xi_N \xrightarrow{w} \xi, \quad \text{in } l^{\infty}(\mathcal{F}), \]

and the limit is a tight Borel measurable element in $l^{\infty}(\mathcal{F})$.

The classical result by Donsker states that the collection of indicator functions
$x \mapsto I\{x \le t\}$ is $\mu$-Donsker with the limiting process $B \circ \mu$, where $B$ is a Brownian
bridge on $[0,1]$ (see van der Vaart and Wellner (1996), pp. 81-82). In this paper
we will be particularly concerned with the collection $\mathcal{F}_{a,b}$ of indicator functions
$x \mapsto I\{a < x \le t\}$ for $-\infty \le a < t \le b \le \infty$, which also is $\mu$-Donsker for any
probability distribution $\mu$. To simplify notation we will often write $\xi_N \xrightarrow{w} \xi$ in
$l^{\infty}[a,b]$ for $\xi_N \xrightarrow{w} \xi$ in $l^{\infty}(\mathcal{F}_{a,b})$.

To obtain convergence results for mappings of the empirical process it is useful
to apply the functional delta-method. Let $E_1$ and $E_2$ be two metric spaces. A
mapping $\phi : E_1 \to E_2$ is said to be Hadamard differentiable at $\theta$ tangentially to
$E_0 \subset E_1$ if there is a continuous mapping $\phi'_\theta : E_1 \to E_2$ such that

\[ \frac{\phi(\theta + t_n h_n) - \phi(\theta)}{t_n} \to \phi'_\theta(h) \]

for all sequences $t_n \to 0$ and $h_n \to h$, where $h \in E_0$.

Theorem 2.1 (Functional delta-method, c.f. van der Vaart and Wellner (1996),
Theorem 3.9.4). Let $\phi : E_1 \to E_2$ be Hadamard differentiable at $\theta$ tangentially to
$E_0 \subset E_1$. Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables taking values in $E_1$.
Suppose $r_n(X_n - \theta) \xrightarrow{w} X \in E_0$ for some sequence of constants $r_n \to \infty$. Then
$r_n(\phi(X_n) - \phi(\theta)) \xrightarrow{w} \phi'_\theta(X)$.

For any càdlàg function $F : \mathbb{R} \to \mathbb{R}$, define the inverse map $\phi_p$ by

\[ \phi_p(F) = F^{\leftarrow}(p) = \inf\{u : F(u) \ge p\}, \quad p \in (0,1). \]

The following result shows that the functional delta-method implies the convergence 
of quantiles. 



Proposition 2.2 (c.f. Lemma 3.9.23 and Example 3.9.24 in van der Vaart and Wellner
(1996)). Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of independent and identically distributed
random variables with d.f. $F$. Suppose $F$ has a continuous density $f > 0$ with respect
to the Lebesgue measure on the interval $[F^{\leftarrow}(p) - \varepsilon, F^{\leftarrow}(q) + \varepsilon]$, for $0 < p < q < 1$
and $\varepsilon > 0$. Then

\[ \sqrt{N}\,\big(F_N^{\leftarrow} - F^{\leftarrow}\big) \xrightarrow{w} \frac{B}{f(F^{\leftarrow})}, \quad \text{in } l^{\infty}[p,q], \]

where the right-hand side refers to the random function $u \mapsto B(u)/f(F^{\leftarrow}(u))$.



3. Empirical processes and importance sampling 

The empirical measure resulting from a random sample of an importance sam- 
pling algorithm with sampling distribution v can be used to approximate important 
parts of the original distribution. In our context v is chosen to give a good approx- 
imation of the extreme tail of the original distribution. 

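Let $\{X_i\}_{i=1}^{\infty}$ be independent identically distributed with distribution $\nu$. The
empirical measure with likelihood ratio weights and the corresponding tail empirical
distribution are written

\[ \mu_N^{w} = \frac{1}{N} \sum_{i=1}^{N} \frac{d\mu}{d\nu}(X_i)\,\delta_{X_i}, \qquad
\overline{F}_{\nu,N}(t) = \mu_N^{w}(I\{\cdot > t\}) = \frac{1}{N} \sum_{i=1}^{N} \frac{d\mu}{d\nu}(X_i)\, I\{X_i > t\}. \qquad (3.1) \]

Let $\mathcal{F}_a$ be the collection of indicator functions $I\{\cdot > t\}$ with $t \ge a$. For the
importance sampling estimators we are concerned with a central limit theorem of
the form

\[ \sqrt{N}\,\big(\mu_N^{w} - \mu\big) \xrightarrow{w} Z, \quad \text{in } l^{\infty}(\mathcal{F}_a), \]

which we write, with slight abuse of notation, as

\[ \sqrt{N}\,\big(\overline{F}_{\nu,N} - \overline{F}\big) \xrightarrow{w} Z, \quad \text{in } l^{\infty}[a, \infty]. \]

Note that $\mu_N^{w}(f) = \nu_N(wf)$ and $\mu(f) = \nu(wf)$, where $w = d\mu/d\nu$. Therefore
the central limit theorem can be stated by saying that the collection
$w\mathcal{F}_a = \{wf : f \in \mathcal{F}_a\}$ is $\nu$-Donsker. By the permanence properties of Donsker classes (see
van der Vaart and Wellner (1996), Section 2.10) this follows when $\mathcal{F}_a$ is $\nu$-Donsker
and $E_\nu w(X)^2 I\{X > a\} < \infty$.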

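To identify the limiting process $Z$ we first need to calculate the covariance
function of the process $\overline{F}_{\nu,N}$.

Lemma 3.1. Let $\{X_i\}_{i=1}^{\infty}$ be independent and identically distributed with
distribution $\nu$ with $\mu \ll \nu$ and $w = d\mu/d\nu$. If $E_\nu w(X_1)^2 I\{X_1 > a\} < \infty$, for some $a \ge -\infty$,
then, for $y \ge x \ge a$,

\[ g(x,y) := N\,\mathrm{Cov}\big(\overline{F}_{\nu,N}(x), \overline{F}_{\nu,N}(y)\big) = E_\nu w(X_1)^2 I\{X_1 > y\} - \overline{F}(x)\overline{F}(y). \qquad (3.2) \]

Proof. This is a direct calculation. Indeed,

\[
\begin{aligned}
\mathrm{Cov}\big(\overline{F}_{\nu,N}(x), \overline{F}_{\nu,N}(y)\big)
&= E_\nu\big(\overline{F}_{\nu,N}(x)\overline{F}_{\nu,N}(y)\big) - E_\nu\big(\overline{F}_{\nu,N}(x)\big)E_\nu\big(\overline{F}_{\nu,N}(y)\big) \\
&= \frac{1}{N^2} \sum_{i,j=1}^{N} E_\nu\big(w(X_i)w(X_j) I\{X_i > x\} I\{X_j > y\}\big) - \overline{F}(x)\overline{F}(y) \\
&= \frac{1}{N} E_\nu\big(w(X_1)^2 I\{X_1 > x\} I\{X_1 > y\}\big) \\
&\quad + \frac{1}{N^2} \sum_{i \ne j} E_\nu\big(w(X_i)w(X_j) I\{X_i > x\} I\{X_j > y\}\big) - \overline{F}(x)\overline{F}(y) \\
&= \frac{1}{N} E_\nu\big(w(X_1)^2 I\{X_1 > y\}\big)
 + \frac{N^2 - N}{N^2} E_\nu\big(w(X_i)w(X_j) I\{X_i > x\} I\{X_j > y\}\big) - \overline{F}(x)\overline{F}(y) \\
&= \frac{1}{N} E_\nu\big(w(X_1)^2 I\{X_1 > y\}\big) + \frac{N-1}{N}\overline{F}(x)\overline{F}(y) - \overline{F}(x)\overline{F}(y) \\
&= \frac{1}{N} E_\nu\big(w(X_1)^2 I\{X_1 > y\}\big) - \frac{1}{N}\overline{F}(x)\overline{F}(y),
\end{aligned}
\]

where $i \ne j$ in the fourth line. □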



Note that if $w = 1$, i.e. the sampling measure is the original measure, then
the covariance function becomes $\overline{F}(y) - \overline{F}(y)\overline{F}(x) = \overline{F}(y)(1 - \overline{F}(x)) = F(x)(1 - F(y))$,
$y \ge x$, which corresponds to a Brownian bridge evaluated at $F$.

Now we are ready to state the central limit theorem for the tail empirical distri- 
bution of an importance sampling algorithm. 

Proposition 3.2. Let $Z$ be a centered Gaussian process with covariance function
$g$ given by (3.2). If $E_\nu w(X_1)^2 I\{X_1 > a\} < \infty$ for some $a \ge -\infty$, then

\[ \sqrt{N}\,\big(\overline{F}_{\nu,N} - \overline{F}\big) \xrightarrow{w} Z, \quad \text{in } l^{\infty}[a, \infty]. \]

Proof. We have already seen that $E_\nu w(X_1)^2 I\{X_1 > a\} < \infty$ implies that $w\mathcal{F}_a$
is $\nu$-Donsker. Hence, we need only to identify the limiting process $Z$. Denote
$\xi_N(x) = \sqrt{N}(\overline{F}_{\nu,N}(x) - \overline{F}(x))$. By the multivariate central limit theorem the finite
dimensional distributions converge; for any $x_1, \dots, x_k$ with $x_i \ge a$,

\[ (\xi_N(x_1), \dots, \xi_N(x_k)) \xrightarrow{w} N(0, \Sigma), \]

where the entries of $\Sigma$ are $\Sigma_{ij} = g(x_i, x_j)$. This determines that the limiting process must
be $Z$. □

We proceed with the asymptotic normality of the quantile transform. The proof
is very similar to that of Proposition 2.2 and therefore omitted.

Proposition 3.3. Let $Z$ be a centered Gaussian process with covariance function
$g$ in (3.2). Suppose $F$ has a continuous density $f > 0$ with respect to the Lebesgue
measure on the interval $[F^{\leftarrow}(p) - \varepsilon, F^{\leftarrow}(q) + \varepsilon]$, for $0 < p < q < 1$ and $\varepsilon > 0$.
If $E_\nu w(X_1)^2 I\{X_1 > F^{\leftarrow}(p) - \varepsilon\} < \infty$, then

\[ \sqrt{N}\,\big((1 - \overline{F}_{\nu,N})^{\leftarrow} - F^{\leftarrow}\big) \xrightarrow{w} \frac{Z(F^{\leftarrow})}{f(F^{\leftarrow})}, \quad \text{in } l^{\infty}[p,q]. \]

Next consider a central limit theorem for an importance sampling estimate of
expected shortfall. For a non-decreasing càdlàg function $F^{\leftarrow}$ and $0 < p < 1$ we use
the notation

\[ \gamma_p(F^{\leftarrow}) = \frac{1}{1-p} \int_p^1 F^{\leftarrow}(u)\,du. \]

Recall that expected shortfall at level $p$ for a random variable $X$ with d.f. $F$ is given
by $\gamma_p(F^{\leftarrow})$ and the importance sampling estimate based on a sample $X_1, \dots, X_N$
with sampling distribution $\nu$ is given by $\gamma_p((1 - \overline{F}_{\nu,N})^{\leftarrow})$.

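Proposition 3.4. Assume the hypotheses of Proposition 3.3. If, in addition,
$\int_{F^{\leftarrow}(p)}^{\infty} \int_{F^{\leftarrow}(p)}^{\infty} g(x,y)\,dx\,dy < \infty$ and $g(x,x) = o([f(x)/\overline{F}(x)]^2)$, as $x \to \infty$, then

\[ \sqrt{N}\,\big(\gamma_p((1 - \overline{F}_{\nu,N})^{\leftarrow}) - \gamma_p(F^{\leftarrow})\big) \xrightarrow{w} \frac{1}{1-p} \int_p^1 \frac{Z(F^{\leftarrow}(u))}{f(F^{\leftarrow}(u))}\,du, \]

as $N \to \infty$.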

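Proof. Let $q$ and $\varepsilon$ be arbitrary with $p < q < 1$ and $\varepsilon > 0$. Since

\[
\begin{aligned}
&P\Big( \Big| \sqrt{N}\big[\gamma_p((1 - \overline{F}_{\nu,N})^{\leftarrow}) - \gamma_p(F^{\leftarrow})\big] - \frac{1}{1-p}\int_p^1 \frac{Z(F^{\leftarrow}(u))}{f(F^{\leftarrow}(u))}\,du \Big| > \varepsilon \Big) \\
&\quad \le P\Big( \Big| \frac{\sqrt{N}}{1-p}\int_p^q \big[(1 - \overline{F}_{\nu,N})^{\leftarrow}(u) - F^{\leftarrow}(u)\big]\,du - \frac{1}{1-p}\int_p^q \frac{Z(F^{\leftarrow}(u))}{f(F^{\leftarrow}(u))}\,du \Big| > \varepsilon/3 \Big) \qquad (3.3) \\
&\qquad + P\Big( \Big| \frac{\sqrt{N}}{1-p}\int_q^1 \big[(1 - \overline{F}_{\nu,N})^{\leftarrow}(u) - F^{\leftarrow}(u)\big]\,du \Big| > \varepsilon/3 \Big) \qquad (3.4) \\
&\qquad + P\Big( \Big| \frac{1}{1-p}\int_q^1 \frac{Z(F^{\leftarrow}(u))}{f(F^{\leftarrow}(u))}\,du \Big| > \varepsilon/3 \Big), \qquad (3.5)
\end{aligned}
\]

it is sufficient to show that each of the three terms converges to 0, as first $N \to \infty$
and then $q \to 1$.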

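Consider first (3.3). Let $\gamma_{p,q}$ be the map defined by

\[ \gamma_{p,q}(H) = \frac{1}{1-p} \int_p^q H(u)\,du, \]

on the set $D_\gamma$ of all non-decreasing càdlàg functions $H$. Since $\gamma_{p,q}$ is linear it is
Hadamard differentiable on $D_\gamma$ with derivative $\gamma'_{p,q}(h) = \gamma_{p,q}(h)$. In particular,
Proposition 3.3 and the delta-method imply that

\[ \sqrt{N}\,\big(\gamma_{p,q}((1 - \overline{F}_{\nu,N})^{\leftarrow}) - \gamma_{p,q}(F^{\leftarrow})\big) \xrightarrow{w} \frac{1}{1-p} \int_p^q \frac{Z(F^{\leftarrow}(u))}{f(F^{\leftarrow}(u))}\,du. \]

This takes care of (3.3). Next consider (3.5). By Chebyshev's inequality

\[
\begin{aligned}
P\Big( \Big| \frac{1}{1-p}\int_q^1 \frac{Z(F^{\leftarrow}(u))}{f(F^{\leftarrow}(u))}\,du \Big| > \varepsilon/3 \Big)
&\le \Big(\frac{3}{\varepsilon(1-p)}\Big)^2 \mathrm{Var}\Big( \int_q^1 \frac{Z(F^{\leftarrow}(u))}{f(F^{\leftarrow}(u))}\,du \Big) \\
&= \Big(\frac{3}{\varepsilon(1-p)}\Big)^2 \int_q^1 \int_q^1 \frac{g(F^{\leftarrow}(u), F^{\leftarrow}(v))}{f(F^{\leftarrow}(u)) f(F^{\leftarrow}(v))}\,du\,dv \\
&= \Big(\frac{3}{\varepsilon(1-p)}\Big)^2 \int_{F^{\leftarrow}(q)}^{\infty} \int_{F^{\leftarrow}(q)}^{\infty} g(x,y)\,dx\,dy.
\end{aligned}
\]

Since the integral is finite, this converges to 0 as $q \to 1$.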
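It remains to consider (3.4). First, write

\[
\begin{aligned}
&\frac{\sqrt{N}}{1-p} \Big| \int_q^1 \big[(1 - \overline{F}_{\nu,N})^{\leftarrow}(u) - F^{\leftarrow}(u)\big]\,du \Big| \\
&\quad \le \frac{\sqrt{N}}{1-p} \Big| \int_{(1 - \overline{F}_{\nu,N})^{\leftarrow}(q)}^{\infty} \overline{F}_{\nu,N}(x)\,dx - \int_{F^{\leftarrow}(q)}^{\infty} \overline{F}(x)\,dx \Big|
 + \frac{\sqrt{N}(1-q)}{1-p} \big| (1 - \overline{F}_{\nu,N})^{\leftarrow}(q) - F^{\leftarrow}(q) \big| \\
&\quad \le \frac{\sqrt{N}}{1-p} \Big| \int_{F^{\leftarrow}(q)}^{\infty} \big[\overline{F}_{\nu,N}(x) - \overline{F}(x)\big]\,dx \Big| \qquad (3.6) \\
&\qquad + \frac{\sqrt{N}}{1-p} \int_{(1 - \overline{F}_{\nu,N})^{\leftarrow}(q)}^{F^{\leftarrow}(q)} \overline{F}_{\nu,N}(x)\,dx \; I\{(1 - \overline{F}_{\nu,N})^{\leftarrow}(q) < F^{\leftarrow}(q)\} \qquad (3.7) \\
&\qquad + \frac{\sqrt{N}}{1-p} \int_{F^{\leftarrow}(q)}^{(1 - \overline{F}_{\nu,N})^{\leftarrow}(q)} \overline{F}_{\nu,N}(x)\,dx \; I\{(1 - \overline{F}_{\nu,N})^{\leftarrow}(q) > F^{\leftarrow}(q)\} \qquad (3.8) \\
&\qquad + \frac{\sqrt{N}(1-q)}{1-p} \big| (1 - \overline{F}_{\nu,N})^{\leftarrow}(q) - F^{\leftarrow}(q) \big|. \qquad (3.9)
\end{aligned}
\]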

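First consider (3.6). By Proposition 3.2 and the delta method

\[
\begin{aligned}
\lim_{N \to \infty} P\Big( \frac{\sqrt{N}}{1-p} \Big| \int_{F^{\leftarrow}(q)}^{\infty} \big[\overline{F}_{\nu,N}(x) - \overline{F}(x)\big]\,dx \Big| > \varepsilon/12 \Big)
&= P\Big( \frac{1}{1-p} \Big| \int_{F^{\leftarrow}(q)}^{\infty} Z(x)\,dx \Big| > \varepsilon/12 \Big) \\
&\le \Big(\frac{12}{\varepsilon(1-p)}\Big)^2 \mathrm{Var}\Big( \int_{F^{\leftarrow}(q)}^{\infty} Z(x)\,dx \Big) \\
&= \Big(\frac{12}{\varepsilon(1-p)}\Big)^2 \int_{F^{\leftarrow}(q)}^{\infty} \int_{F^{\leftarrow}(q)}^{\infty} g(x,y)\,dx\,dy.
\end{aligned}
\]

Since $F^{\leftarrow}(q) \to \infty$ as $q \to 1$ and the integral is finite the expression in the last
display can be made arbitrarily small. Next, consider (3.7). This term is bounded
from above by

\[
\frac{\sqrt{N}}{1-p}\, \overline{F}_{\nu,N}\big((1 - \overline{F}_{\nu,N})^{\leftarrow}(q)\big) \big( F^{\leftarrow}(q) - (1 - \overline{F}_{\nu,N})^{\leftarrow}(q) \big)
\le \frac{1-q}{1-p} \sqrt{N}\, \big( F^{\leftarrow}(q) - (1 - \overline{F}_{\nu,N})^{\leftarrow}(q) \big),
\]

where we have used that $\overline{F}_{\nu,N}((1 - \overline{F}_{\nu,N})^{\leftarrow}(q)) \le 1 - q$. By Proposition 3.3

\[
\begin{aligned}
\lim_{N \to \infty} P\Big( \frac{1-q}{1-p} \sqrt{N}\, \big| F^{\leftarrow}(q) - (1 - \overline{F}_{\nu,N})^{\leftarrow}(q) \big| > \varepsilon/12 \Big)
&= P\Big( \frac{1-q}{1-p} \Big| \frac{Z(F^{\leftarrow}(q))}{f(F^{\leftarrow}(q))} \Big| > \varepsilon/12 \Big) \\
&\le \Big(\frac{12}{(1-p)\varepsilon}\Big)^2 (1-q)^2\, \frac{g(F^{\leftarrow}(q), F^{\leftarrow}(q))}{f(F^{\leftarrow}(q))^2} \\
&= \Big(\frac{12}{(1-p)\varepsilon}\Big)^2 \overline{F}(F^{\leftarrow}(q))^2\, \frac{g(F^{\leftarrow}(q), F^{\leftarrow}(q))}{f(F^{\leftarrow}(q))^2}.
\end{aligned}
\]

This converges to 0 as $q \to 1$ since $g(x,x) = o([f(x)/\overline{F}(x)]^2)$.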
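Similarly, (3.8) is bounded from above by

\[ \frac{\sqrt{N}}{1-p}\, \overline{F}_{\nu,N}(F^{\leftarrow}(q)) \big( (1 - \overline{F}_{\nu,N})^{\leftarrow}(q) - F^{\leftarrow}(q) \big). \]

This can be treated just like the previous term since, by Proposition 3.3,

\[
\lim_{N \to \infty} P\Big( \frac{\overline{F}_{\nu,N}(F^{\leftarrow}(q))}{1-p} \sqrt{N}\, \big| (1 - \overline{F}_{\nu,N})^{\leftarrow}(q) - F^{\leftarrow}(q) \big| > \varepsilon/12 \Big)
= P\Big( \frac{1-q}{1-p} \Big| \frac{Z(F^{\leftarrow}(q))}{f(F^{\leftarrow}(q))} \Big| > \varepsilon/12 \Big).
\]

Finally, (3.9) can be treated the same way since, by Proposition 3.3,

\[
\lim_{N \to \infty} P\Big( \frac{\sqrt{N}(1-q)}{1-p} \big| (1 - \overline{F}_{\nu,N})^{\leftarrow}(q) - F^{\leftarrow}(q) \big| > \varepsilon/12 \Big)
= P\Big( \frac{1-q}{1-p} \Big| \frac{Z(F^{\leftarrow}(q))}{f(F^{\leftarrow}(q))} \Big| > \varepsilon/12 \Big).
\]

This completes the proof. □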



4. Efficient calculation of risk measures in the heavy-tailed setting

In the previous section we established central limit theorems for importance 
sampling estimates of Value-at-Risk (i.e. quantiles) and expected shortfall. In this 
section we study the limiting variance of the central limit theorems as a function 
of p when p is close to 1. The main requirement is that the asymptotic standard 
deviation coming from the central limit theorem is roughly of the same size as the 
quantity we are trying to compute, when p is close to 1. We only consider the 
case when the original distribution is heavy-tailed, in the sense that F is regularly 
varying. 

There are three main assumptions in this section.

• We assume that the original distribution of interest has a regularly varying
tail. That is, there exists $\alpha > 0$ such that

\[ \lim_{t \to \infty} \frac{\overline{F}(tx)}{\overline{F}(t)} = x^{-\alpha}, \quad x > 0. \qquad (4.1) \]

• We assume that there is an available explicit asymptotic approximation
for $\overline{F}(x)$. More precisely, we know a non-increasing function $U$ such that
$U \sim \overline{F}$, i.e.

\[ \lim_{x \to \infty} \frac{U(x)}{\overline{F}(x)} = 1. \qquad (4.2) \]

• We assume that we can construct sampling measures $\nu_\lambda$ with bounded
relative error for computing rare event probabilities of the type $\overline{F}(\lambda) = P(X > \lambda)$; i.e.

\[ \limsup_{\lambda \to \infty} \frac{E_{\nu_\lambda} w_\lambda(X_1)^2 I\{X_1 > \lambda\}}{\overline{F}(\lambda)^2} < \infty. \qquad (4.3) \]

4.1. Computation of quantiles - Value-at-Risk. For a random variable $X$
with d.f. $F$ the Value-at-Risk at level $p$ is defined as the $p$th quantile; $\mathrm{VaR}_p(X) = F^{\leftarrow}(p)$.
Given $p \in (0,1)$ close to 1, the importance sampling estimate based on
independent and identically distributed samples with sampling distribution $\nu$ is
given by $(1 - \overline{F}_{\nu,N})^{\leftarrow}(p)$. Then Proposition 3.3 and Lemma 3.1 determine the
asymptotic variance as

\[
\begin{aligned}
\mathrm{Var}\Big( \frac{Z(F^{\leftarrow}(p))}{f(F^{\leftarrow}(p))} \Big)
&= \frac{g(F^{\leftarrow}(p), F^{\leftarrow}(p))}{f(F^{\leftarrow}(p))^2} \\
&= \frac{E_\nu w(X_1)^2 I\{X_1 > F^{\leftarrow}(p)\} - \overline{F}(F^{\leftarrow}(p))^2}{f(F^{\leftarrow}(p))^2} \\
&= \frac{E_\nu w(X_1)^2 I\{X_1 > F^{\leftarrow}(p)\} - (1-p)^2}{f(F^{\leftarrow}(p))^2} =: \sigma_p^2.
\end{aligned}
\]

To control the asymptotic variance it seems like a good choice to use an efficient
rare event simulation algorithm designed for efficient computation of $P(X > F^{\leftarrow}(p))$.
That is, with sampling distribution $\nu_{F^{\leftarrow}(p)}$. This is of course impossible since $F^{\leftarrow}(p)$
is unknown. However, the asymptotic approximation $U$ may be helpful. Note that
since $U$ is monotone it has an inverse and by regular variation $(1 - U)^{\leftarrow}(p) \sim F^{\leftarrow}(p)$
as $p \to 1$. Thus, it seems reasonable to use $\nu_{u_p}$ where $u_p = (1 - U)^{\leftarrow}(p)$. This is
justified by the next result.



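Proposition 4.1. Suppose (4.1)-(4.3) hold. If there exists $c_0 < 1$ such that

\[ \limsup_{\lambda \to \infty} \frac{E_{\nu_\lambda} w_\lambda(X_1)^2 I\{X_1 > c_0 \lambda\}}{\overline{F}(\lambda)^2} < \infty, \qquad (4.4) \]

then the sampling measures $\nu_{u_p}$ satisfy

\[ \limsup_{p \to 1} \frac{\sigma_p^2}{F^{\leftarrow}(p)^2} < \infty. \]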



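Proof. First note that (4.4) implies that

\[ \limsup_{\lambda \to \infty} \frac{E_{\nu_\lambda} w_\lambda(X_1)^2 I\{X_1 > c\lambda\}}{\overline{F}(\lambda)^2} < \infty, \]

for any $c \ge c_0$. Take $\varepsilon \in (0, 1 - c_0)$. Then there exists $p_0$ such that $F^{\leftarrow}(p)/u_p > 1 - \varepsilon$,
for each $p \ge p_0$. In particular,

\[
\limsup_{p \to 1} \frac{E_{\nu_{u_p}} w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(p)\}}{\overline{F}(u_p)^2}
\le \limsup_{p \to 1} \frac{E_{\nu_{u_p}} w_{u_p}(X_1)^2 I\{X_1 > (1-\varepsilon) u_p\}}{\overline{F}(u_p)^2} < \infty. \qquad (4.5)
\]

Since $u_p \sim F^{\leftarrow}(p)$ and $x f(x) \sim \alpha \overline{F}(x)$, by Karamata's theorem, it follows that

\[ \lim_{p \to 1} \frac{\overline{F}(u_p)^2}{F^{\leftarrow}(p)^2 f(F^{\leftarrow}(p))^2} = \lim_{p \to 1} \frac{\overline{F}(F^{\leftarrow}(p))^2}{F^{\leftarrow}(p)^2 f(F^{\leftarrow}(p))^2} = \frac{1}{\alpha^2}, \qquad (4.6) \]

\[ \lim_{p \to 1} \frac{(1-p)^2}{F^{\leftarrow}(p)^2 f(F^{\leftarrow}(p))^2} = \lim_{p \to 1} \frac{\overline{F}(F^{\leftarrow}(p))^2}{F^{\leftarrow}(p)^2 f(F^{\leftarrow}(p))^2} = \frac{1}{\alpha^2}. \qquad (4.7) \]

By (4.5), (4.6), and (4.7) it follows that

\[
\begin{aligned}
\limsup_{p \to 1} \frac{\sigma_p^2}{F^{\leftarrow}(p)^2}
&= \limsup_{p \to 1} \frac{E_{\nu_{u_p}} w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(p)\} - (1-p)^2}{F^{\leftarrow}(p)^2 f(F^{\leftarrow}(p))^2} \\
&\le \limsup_{p \to 1} \frac{E_{\nu_{u_p}} w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(p)\}}{\overline{F}(u_p)^2} \cdot \frac{\overline{F}(u_p)^2}{F^{\leftarrow}(p)^2 f(F^{\leftarrow}(p))^2} < \infty.
\end{aligned}
\]

□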

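Under somewhat stronger assumptions it is possible to reach a more explicit
asymptotic bound for $\sigma_p^2 / F^{\leftarrow}(p)^2$.

Proposition 4.2. Suppose (4.1) and (4.2) hold. Suppose additionally that there
exist $c_0 < 1$ and a function $\varphi$, continuous at 1, such that, for $c \ge c_0$,

\[ \limsup_{\lambda \to \infty} \frac{E_{\nu_\lambda} w_\lambda(X_1)^2 I\{X_1 > c\lambda\}}{\overline{F}(\lambda)^2} \le \varphi(c). \qquad (4.8) \]

Then the sampling measures $\nu_{u_p}$ satisfy

\[ \limsup_{p \to 1} \frac{\sigma_p^2}{F^{\leftarrow}(p)^2} \le \frac{\varphi(1) - 1}{\alpha^2}. \]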

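Remark 4.3. If the asymptotic quantile approximation based on $U$ always
underestimates the true quantile, i.e. $u_p \le F^{\leftarrow}(p)$ for each $p$, then one can take $c_0 = 1$
in Proposition 4.1 and Proposition 4.2.

Proof. First note that $(1 - U)^{\leftarrow} \sim F^{\leftarrow}$. Take $\varepsilon \in (0, 1 - c_0)$. Then there exists $p_0$
such that $F^{\leftarrow}(p)/u_p > 1 - \varepsilon$, for each $p \ge p_0$. In particular,

\[ \limsup_{p \to 1} \frac{E_{\nu_{u_p}} w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(p)\}}{\overline{F}(u_p)^2} \le \varphi(1 - \varepsilon). \]

Since $\varepsilon > 0$ is arbitrary and $\varphi$ is continuous at 1 it is possible to let $\varepsilon \to 0$ and get
the upper bound

\[ \limsup_{p \to 1} \frac{E_{\nu_{u_p}} w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(p)\}}{\overline{F}(u_p)^2} \le \varphi(1). \]

Then, by (4.6) and (4.7) it follows that

\[
\begin{aligned}
\limsup_{p \to 1} \frac{\sigma_p^2}{F^{\leftarrow}(p)^2}
&= \limsup_{p \to 1} \frac{E_{\nu_{u_p}} w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(p)\} - (1-p)^2}{F^{\leftarrow}(p)^2 f(F^{\leftarrow}(p))^2} \\
&\le \limsup_{p \to 1} \frac{E_{\nu_{u_p}} w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(p)\}}{\overline{F}(u_p)^2} \cdot \frac{\overline{F}(u_p)^2}{F^{\leftarrow}(p)^2 f(F^{\leftarrow}(p))^2}
 - \lim_{p \to 1} \frac{(1-p)^2}{F^{\leftarrow}(p)^2 f(F^{\leftarrow}(p))^2} \\
&\le \frac{\varphi(1) - 1}{\alpha^2}.
\end{aligned}
\]

□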



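4.2. Expected Shortfall. Next we consider the properties, when $p$ is close to 1, of
the asymptotic variance in the central limit theorem, Proposition 3.4, for expected
shortfall.

Proposition 4.4. Let $\alpha > 2$. Suppose (4.1) and (4.2) hold. Suppose additionally
that there exist $c_0 < 1$ and a non-increasing function $\varphi$, regularly varying with index
$-\alpha$, such that, for $c \ge c_0$,

\[ \limsup_{\lambda \to \infty} \frac{E_{\nu_\lambda} w_\lambda(X_1)^2 I\{X_1 > c\lambda\}}{\overline{F}(\lambda)^2} \le \varphi(c). \qquad (4.9) \]

Then the sampling measures $\nu_{u_p}$ satisfy

\[ \limsup_{p \to 1} \frac{\mathrm{Var}\Big( \frac{1}{1-p} \int_p^1 \frac{Z(F^{\leftarrow}(u))}{f(F^{\leftarrow}(u))}\,du \Big)}{\gamma_p(F^{\leftarrow})^2} < \infty. \]

Moreover, if (4.9) holds with $\varphi(c) = K c^{-\alpha}$, for some constant $K \in (0, \infty)$, then

\[ \limsup_{p \to 1} \frac{\mathrm{Var}\Big( \frac{1}{1-p} \int_p^1 \frac{Z(F^{\leftarrow}(u))}{f(F^{\leftarrow}(u))}\,du \Big)}{\gamma_p(F^{\leftarrow})^2} \le \frac{1}{\alpha^2} \Big( \frac{2K(\alpha - 1)}{\alpha - 2} - 1 \Big). \]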



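Proof. By Lemma 3.1

\[
\begin{aligned}
g(F^{\leftarrow}(u), F^{\leftarrow}(v))
&= E_{\nu_{u_p}}\big( w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(u)\} I\{X_1 > F^{\leftarrow}(v)\} \big) - \overline{F}(F^{\leftarrow}(u)) \overline{F}(F^{\leftarrow}(v)) \\
&= E_{\nu_{u_p}}\big( w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(u)\} I\{X_1 > F^{\leftarrow}(v)\} \big) - (1-u)(1-v),
\end{aligned}
\]

which implies, by Proposition 3.2, that

\[
\begin{aligned}
&\frac{1}{\gamma_p(F^{\leftarrow})^2} \mathrm{Var}\Big( \frac{1}{1-p} \int_p^1 \frac{Z(F^{\leftarrow}(u))}{f(F^{\leftarrow}(u))}\,du \Big) \\
&\quad = \frac{1}{(1-p)^2 \gamma_p(F^{\leftarrow})^2} \int_p^1 \int_p^1 \frac{E\big( Z(F^{\leftarrow}(u)) Z(F^{\leftarrow}(v)) \big)}{f(F^{\leftarrow}(u)) f(F^{\leftarrow}(v))}\,du\,dv \\
&\quad = \frac{1}{(1-p)^2 \gamma_p(F^{\leftarrow})^2} \int_p^1 \int_p^1 \frac{g(F^{\leftarrow}(u), F^{\leftarrow}(v))}{f(F^{\leftarrow}(u)) f(F^{\leftarrow}(v))}\,du\,dv \\
&\quad = \frac{1}{(1-p)^2 \gamma_p(F^{\leftarrow})^2} \int_p^1 \int_p^1 \frac{E_{\nu_{u_p}}\big( w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(u)\} I\{X_1 > F^{\leftarrow}(v)\} \big)}{f(F^{\leftarrow}(u)) f(F^{\leftarrow}(v))}\,du\,dv \qquad (4.10) \\
&\qquad - \frac{1}{(1-p)^2 \gamma_p(F^{\leftarrow})^2} \int_p^1 \int_p^1 \frac{(1-u)(1-v)}{f(F^{\leftarrow}(u)) f(F^{\leftarrow}(v))}\,du\,dv. \qquad (4.11)
\end{aligned}
\]

Consider first (4.11). By Karamata's theorem
$f(F^{\leftarrow}(u)) \sim \alpha \overline{F}(F^{\leftarrow}(u)) / F^{\leftarrow}(u) = \alpha (1-u) / F^{\leftarrow}(u)$ as $u \to 1$, and therefore

\[ \lim_{p \to 1} \frac{1}{(1-p)\gamma_p(F^{\leftarrow})} \int_p^1 \frac{1-u}{f(F^{\leftarrow}(u))}\,du
= \lim_{p \to 1} \frac{\int_p^1 \alpha^{-1} F^{\leftarrow}(u)\,du}{\int_p^1 F^{\leftarrow}(u)\,du} = \frac{1}{\alpha}. \]

This yields,

\[ \lim_{p \to 1} \frac{1}{(1-p)^2 \gamma_p(F^{\leftarrow})^2} \int_p^1 \int_p^1 \frac{(1-u)(1-v)}{f(F^{\leftarrow}(u)) f(F^{\leftarrow}(v))}\,du\,dv = \frac{1}{\alpha^2}. \]

Next rewrite (4.10), using symmetry to restrict to $v \ge u$, as

\[ \frac{2}{(1-p)^2 \gamma_p(F^{\leftarrow})^2} \int_p^1 \frac{1}{f(F^{\leftarrow}(u))} \int_u^1 \frac{E_{\nu_{u_p}}\big( w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(v)\} \big)}{f(F^{\leftarrow}(v))}\,dv\,du. \]

Then, the inner integrand can be written

\[
\begin{aligned}
\frac{E_{\nu_{u_p}}\big( w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(v)\} \big)}{f(F^{\leftarrow}(v))}
&= \frac{E_{\nu_{u_p}}\big( w_{u_p}(X_1)^2 I\{X_1 > \tfrac{F^{\leftarrow}(v)}{u_p} u_p\} \big)}{\overline{F}(u_p)^2}
 \cdot \frac{\overline{F}(u_p)^2}{\overline{F}(F^{\leftarrow}(p))^2} \cdot \frac{\overline{F}(F^{\leftarrow}(p))^2}{f(F^{\leftarrow}(v))} \\
&\lesssim \varphi\Big( \frac{F^{\leftarrow}(v)}{u_p} \Big) \frac{(1-p)^2}{f(F^{\leftarrow}(v))}, \qquad (4.12)
\end{aligned}
\]

where $\lesssim$ denotes an asymptotic upper bound as $p \to 1$ and we have used (4.9). By Potter's
bound there exists, for each $\varepsilon > 0$, a constant $C_\varepsilon$ such that
$\varphi(F^{\leftarrow}(v)/u_p) \le C_\varepsilon (F^{\leftarrow}(v)/u_p)^{-\alpha+\varepsilon}$. Take $0 < \varepsilon < \alpha - 2$.
The asymptotics of the integral in (4.10) can now be determined as

\[
\begin{aligned}
&\int_p^1 \frac{1}{f(F^{\leftarrow}(u))} \int_u^1 \frac{E_{\nu_{u_p}}\big( w_{u_p}(X_1)^2 I\{X_1 > F^{\leftarrow}(v)\} \big)}{f(F^{\leftarrow}(v))}\,dv\,du \\
&\quad \lesssim C_\varepsilon (1-p)^2 \int_p^1 \frac{1}{f(F^{\leftarrow}(u))} \int_u^1 \frac{(F^{\leftarrow}(v)/u_p)^{-\alpha+\varepsilon}}{f(F^{\leftarrow}(v))}\,dv\,du \\
&\quad = C_\varepsilon (1-p)^2 u_p^{\alpha-\varepsilon} \int_{F^{\leftarrow}(p)}^{\infty} \int_x^{\infty} y^{-\alpha+\varepsilon}\,dy\,dx \\
&\quad = C_\varepsilon (1-p)^2 u_p^{\alpha-\varepsilon} \int_{F^{\leftarrow}(p)}^{\infty} \frac{x^{-\alpha+\varepsilon+1}}{\alpha-\varepsilon-1}\,dx \\
&\quad = C_\varepsilon (1-p)^2 u_p^{\alpha-\varepsilon} \frac{F^{\leftarrow}(p)^{-\alpha+\varepsilon+2}}{(\alpha-\varepsilon-1)(\alpha-\varepsilon-2)} \\
&\quad \sim C_\varepsilon (1-p)^2 \frac{F^{\leftarrow}(p)^2}{(\alpha-\varepsilon-1)(\alpha-\varepsilon-2)}.
\end{aligned}
\]

By Karamata's theorem

\[ \gamma_p(F^{\leftarrow}) = \frac{1}{1-p} \int_p^1 F^{\leftarrow}(u)\,du = \frac{1}{1-p} \int_{F^{\leftarrow}(p)}^{\infty} x f(x)\,dx
\sim \frac{1}{1-p} \cdot \frac{\alpha}{\alpha-1} F^{\leftarrow}(p) \overline{F}(F^{\leftarrow}(p)) = \frac{\alpha}{\alpha-1} F^{\leftarrow}(p). \]

Putting everything together, the expression in (4.10) is asymptotically bounded.

Moreover, if (4.9) holds with $\varphi(c) = K c^{-\alpha}$, for some constant $K \in (0, \infty)$, then
it is possible to take $C_\varepsilon = K$ and $\varepsilon = 0$. This results in

\[ \limsup_{p \to 1} \frac{\mathrm{Var}\Big( \frac{1}{1-p} \int_p^1 \frac{Z(F^{\leftarrow}(u))}{f(F^{\leftarrow}(u))}\,du \Big)}{\gamma_p(F^{\leftarrow})^2} \le \frac{1}{\alpha^2} \Big( \frac{2K(\alpha - 1)}{\alpha - 2} - 1 \Big). \]

□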
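The explicit constant in the second part of Proposition 4.4 is easy to evaluate numerically. The following Python helper is only a convenience sketch; its name and argument names are hypothetical, and $K$ is whatever constant the chosen importance sampling algorithm delivers in (4.9).

```python
def expected_shortfall_variance_bound(K, alpha):
    """Evaluate (1 / alpha^2) * (2 K (alpha - 1) / (alpha - 2) - 1) for alpha > 2."""
    if alpha <= 2:
        raise ValueError("the bound requires a tail index alpha > 2")
    return (2.0 * K * (alpha - 1.0) / (alpha - 2.0) - 1.0) / alpha ** 2
```

For instance, $K = 1$ and $\alpha = 3$ give the value $1/3$, so in that case the asymptotic standard deviation is at most about 58 percent of the expected shortfall itself. The bound deteriorates as $\alpha$ approaches 2.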



5. Examples and numerical illustrations 

In this section we use the methods presented in the previous sections to design 
efficient algorithms for computing Value-at-Risk and expected shortfall of a random 
variable X which is the value at time n > 1 of a heavy-tailed random walk. More 
precisely, 

\[ X = \sum_{i=1}^{n} Z_i, \qquad (5.1) \]

where $Z_i$, $i = 1, \dots, n$, are i.i.d. and regularly varying with tail index $\alpha$. We will
use $F_X$ and $F_Z$ to denote the d.f. of $X$ and $Z_1$, respectively. We write $f_X$ and $f_Z$
for the corresponding densities. First we need to establish that the assumptions in
the beginning of Section 4 are satisfied.

The subexponential property implies that the tail of the random variable $X$ satisfies
$\overline{F}_X(x) \sim n \overline{F}_Z(x)$, as $x \to \infty$. Hence, $\overline{F}_X$ is regularly varying with index $-\alpha$
and the function $U$ can be taken to be $n \overline{F}_Z$. Finally, we need to consider
importance sampling algorithms with bounded relative error for computing rare event
probabilities of the form $P(X > \lambda)$.
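For a concrete step distribution the approximate quantile $u_p = (1 - U)^{\leftarrow}(p)$ is available in closed form. The Python sketch below assumes, purely for illustration, Pareto steps with tail $P(Z_1 > x) = x^{-\alpha}$ for $x \ge 1$, so that $U(x) = n x^{-\alpha}$; the function name is hypothetical.

```python
def approximate_quantile(p, n, alpha):
    """u_p solving n * u^(-alpha) = 1 - p under the assumed Pareto step tail."""
    return (n / (1.0 - p)) ** (1.0 / alpha)
```

With $n = 4$ steps, $\alpha = 2$ and $p = 0.99$ this gives $u_p = 20$, the threshold at which the rare event algorithm would be tuned.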

There exist several importance sampling algorithms for efficient computation of
rare event probabilities of this form. Here we consider the dynamic mixture
algorithms described in Hult and Svensson (2009) to generate $N$ independent samples
of $X$, denoted $X_1, \dots, X_N$. In particular we consider the conditional mixture
algorithm of Dupuis et al. (2007) and the scaling mixture algorithm of Hult and Svensson
(2009). Then, the tail e.d.f. $\overline{F}_{\nu_{u_p},N}$ is constructed from the sample. Value-at-Risk
is computed as $(1 - \overline{F}_{\nu_{u_p},N})^{\leftarrow}(p)$ and expected shortfall as $\gamma_p((1 - \overline{F}_{\nu_{u_p},N})^{\leftarrow})$.

In the next subsection we verify the conditions of Proposition 3.4, Proposition
4.2 and Proposition 4.4 for these algorithms. Then the algorithms are implemented
and their numerical performance is illustrated when $Z_1$ has a Pareto distribution.



5.1. Dynamic mixture algorithms. The dynamic mixture algorithm is designed
for generating samples of $X$ in (5.1) in order to efficiently compute rare event
probabilities of the form $P(X > \lambda)$. Here it is convenient to use the notation
$S_i = Z_1 + \dots + Z_i$, $i \ge 1$, $S_0 = 0$, and with this notation $X = S_n$ is the variable
of interest. Each sample of $S_n$ is generated sequentially by sampling $Z_i$ from a
mixture where the distribution of $Z_i$ may depend on the current state. In the
$i$th step, $i = 1, \dots, n$, where $S_{i-1} = s_{i-1}$, $Z_i$ is sampled as follows.

• If $s_{i-1} > \lambda$, $Z_i$ is sampled from the original density $f_Z$,

• if $s_{i-1} \le \lambda$, $Z_i$ is sampled from

\[
\begin{cases}
p_i f_Z(\cdot) + q_i g_i(\cdot \mid s_{i-1}), & \text{for } 1 \le i \le n-1, \\
g_n(\cdot \mid s_{n-1}), & \text{for } i = n,
\end{cases}
\]

where $g_i(\cdot \mid s_{i-1})$ is a state dependent density. Here $p_i + q_i = 1$ and
$p_i \in (0,1)$.

The sampling distribution of $S_n$ obtained by the dynamic mixture algorithm
for computing $P(S_n > \lambda)$ is, throughout this section, denoted $\nu_\lambda$.

The following results provide sufficient conditions for the upper bound $\varphi(c)$ that
appears in Proposition 4.2 and Proposition 4.4.

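Lemma 5.1. Consider the mixture algorithm above with $p_i > 0$ for $1 \le i \le n-1$.
Suppose there exist $a \in (0,1)$ and $c > 0$ such that

\[ \liminf_{\lambda \to \infty} \; \inf_{\substack{s \le c(1 - (1-a)^{i-1}) \\ y > a(c - s)}} \frac{g_i(\lambda y \mid \lambda s)}{f_Z(\lambda y)}\, \overline{F}_Z(\lambda) > 0, \quad 1 \le i \le n, \qquad (5.2) \]

\[ \limsup_{\lambda \to \infty} \; \sup_{\substack{s \le c \\ y > c - s}} \frac{f_Z(\lambda y)}{g_n(\lambda y \mid \lambda s)} < \infty. \qquad (5.3) \]

Then the scaled Radon-Nikodym derivative $\frac{1}{\overline{F}_Z(\lambda)} \frac{d\mu}{d\nu_\lambda}(\lambda y)$ is bounded on
$\{y_1 + \dots + y_n > c\}$.

The proof is essentially identical to the proof of Lemma 3.1 in Hult and Svensson
(2009) and therefore omitted.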






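Theorem 5.2. Suppose (5.2) and (5.3) hold for $a \in (0,1)$. Suppose, in addition,
that there exist continuous functions $h_i : \mathbb{R}^n \to [0, \infty)$ and a constant $c_0 > 0$ such
that

\[ \frac{f_Z(\lambda y_i)}{g_i(\lambda y_i \mid \lambda s_{i-1})\, \overline{F}_Z(\lambda)} \to h_i(y_i \mid s_{i-1}), \qquad (5.4) \]

uniformly on $\{y \in \mathbb{R}^n : s_{i-1} \le c(1 - (1-a)^{i-1}),\; y_i > a(c - s_{i-1})\}$ for any $c \ge c_0$.
Then, for $c \ge c_0$,

\[ \limsup_{\lambda \to \infty} \frac{E_{\nu_\lambda} w_\lambda(X_1)^2 I\{X_1 > c\lambda\}}{\overline{F}_X(\lambda)^2}
\le \sum_{i=1}^{n} \prod_{j=1}^{i-1} \frac{1}{p_j} \cdot \frac{1}{q_i} \int_c^{\infty} h_i(y_i \mid 0)\, \alpha y_i^{-\alpha-1}\,dy_i, \qquad (5.5) \]

with $q_n = 1$.

The proof is essentially identical to the proof of Theorem 3.2 in Hult and Svensson
(2009) and therefore omitted.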

By Theorem 5.2 we see that the function $\varphi$ in Proposition 4.2 can be taken as the
right-hand side of (5.5). Moreover, by Karamata's theorem, it is regularly varying
with index $-\alpha$ if $h_i(y_i \mid 0)$ is slowly varying.

Finally, we establish that the conditions on the covariance function $g$ in
Proposition 3.4 are satisfied.



Lemma 5.3. Let $X$ have distribution $\mu$, d.f. $F_X$ and density $f_X$. Suppose $\overline{F}_X$ is
regularly varying with index $-\alpha$, with $\alpha > 2$. Let $\nu$ denote any sampling distribution.
If $d\mu/d\nu$ is bounded on $(a, \infty)$ then the covariance function $g$ in (3.2) satisfies
$\int_a^{\infty} \int_a^{\infty} g(x,y)\,dx\,dy < \infty$ and $g(x,x) = o([f_X(x)/\overline{F}_X(x)]^2)$.

Proof. First note that if $d\mu/d\nu \le K$ for some constant $K \in (0, \infty)$ then
$g(x,y) \le K \overline{F}_X(y) - \overline{F}_X(x)\overline{F}_X(y)$, for $y \ge x$.

Then

\[
\int_a^{\infty} \int_a^{\infty} g(x,y)\,dx\,dy = 2 \int_a^{\infty} \int_x^{\infty} g(x,y)\,dy\,dx
\le 2K \int_a^{\infty} \int_x^{\infty} \overline{F}_X(y)\,dy\,dx - \Big( \int_a^{\infty} \overline{F}_X(y)\,dy \Big)^2.
\]

The first integral is finite, by Karamata's theorem, since $\alpha > 2$ and the second
integral is finite for $\alpha > 1$ and then also for $\alpha > 2$. For the second condition

\[ \frac{g(x,x)}{f_X(x)^2 / \overline{F}_X(x)^2} \le K \frac{\overline{F}_X(x)^3}{f_X(x)^2} - \frac{\overline{F}_X(x)^4}{f_X(x)^2}. \]

By Karamata's theorem $\alpha \overline{F}_X(x) \sim x f_X(x)$ so the expression in the last display is
asymptotically equivalent to

\[ K \alpha^{-3} x^3 f_X(x) - \alpha^{-4} x^4 f_X(x)^2. \]

This converges to 0 as $x \to \infty$ when $\alpha > 2$ since $f_X$ is regularly varying with index
$-\alpha - 1$. This completes the proof. □

5.2. Conditional mixture algorithms. The conditional mixture algorithm by
Dupuis et al. (2007) can be treated with the above results.
The conditional mixture algorithm has, with $a \in (0,1)$,

\[
\begin{aligned}
g_i(x \mid s) &= \frac{f_Z(x) I\{x > a(\lambda - s)\}}{\overline{F}_Z(a(\lambda - s))}, \quad 1 \le i \le n-1, \\
g_n(x \mid s) &= \frac{f_Z(x) I\{x > \lambda - s\}}{\overline{F}_Z(\lambda - s)}.
\end{aligned}
\]



Then the techniques for establishing the conditions of Lemma |5TT| and Theorem|5.2[ 
with c p = 1, are completely similar to the ones in Section 4.1 in Hul t and Svensson 
(2009). The upper bound i n The orem 15.21 holds where the functions hi are given 
by (see Hult and Svensson ( 20091 )) 



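$$h_i(y \mid s) = \lim_{\lambda \to \infty} \frac{f_Z(\lambda y)}{\dfrac{f_Z(\lambda y)}{\bar F_Z(a\lambda(1 - s))}\,\bar F_Z(\lambda)\, I\{y > a(1 - s)\}} = a^{-\alpha}(1-s)^{-\alpha}, \quad i = 1, \dots, n-1,$$

and

$$h_n(y \mid s) = \lim_{\lambda \to \infty} \frac{f_Z(\lambda y)}{\dfrac{f_Z(\lambda y)}{\bar F_Z(\lambda(1 - s))}\,\bar F_Z(\lambda)\, I\{y > 1 - s\}} = (1-s)^{-\alpha}.$$

The resulting upper bound $\varphi(c)$ in Propositions 4.2 and 4.4 is given by

$$\varphi(c) = c^{-\alpha}\Big(\sum_{i=1}^{n-1} \prod_{j=1}^{i-1} \frac{1}{p_j}\,\frac{a^{-2\alpha}}{q_i} + \prod_{j=1}^{n-1} \frac{1}{p_j}\Big),$$

where $a \in (0, 1)$.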
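
The conditional densities above can be sampled by inverse transform when the tail is explicit. The following is a minimal sketch, assuming the Pareto tail $P(Z > x) = (1 + x)^{-\alpha}$ from the numerical examples; the function names `tail`, `sample_conditional` and `h_ratio` are hypothetical and introduced only for illustration. It also evaluates the pre-limit ratio $\bar F_Z(a\lambda(1-s))/\bar F_Z(\lambda)$, which should approach $a^{-\alpha}(1-s)^{-\alpha}$ for large $\lambda$.

```python
import random


def tail(x, alpha):
    """Pareto tail P(Z > x) = (1 + x)^(-alpha) for x >= 0 (assumed model)."""
    return (1.0 + x) ** (-alpha)


def sample_conditional(level, alpha, rng):
    """Draw Z from f_Z conditioned on Z > level by inverting the conditional tail.

    For the conditional mixture component one would take level = a * (lam - s).
    """
    u = 1.0 - rng.random()  # u lies in (0, 1]
    return (1.0 + level) * u ** (-1.0 / alpha) - 1.0


def h_ratio(lam, a, s, alpha):
    """Pre-limit ratio tail(a * lam * (1 - s)) / tail(lam)."""
    return tail(a * lam * (1.0 - s), alpha) / tail(lam, alpha)


rng = random.Random(1)
draws = [sample_conditional(5.0, 2.0, rng) for _ in range(1000)]
limit = (0.5 * (1.0 - 0.2)) ** (-2.0)  # a^(-alpha) * (1 - s)^(-alpha) for a=0.5, s=0.2
```

Every draw exceeds the conditioning level by construction, and `h_ratio(1e9, 0.5, 0.2, 2.0)` is numerically close to `limit`.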



5.3. Scaling mixture algorithms. In the scaling mixture algorithm the large
variables are generated by sampling from the original density and multiplying by
a large number. In the context of scaling mixtures we assume that the original
density $f_Z$ is strictly positive on $(0, \infty)$. We also assume that $f_Z(x) = x^{-\alpha-1} L(x)$
with $L$ slowly varying and $\inf_{x > x_0} L(x) =: L_* > 0$ for some $x_0 > 0$. The scaling
mixture algorithm, with $\sigma > 0$, has

$$g_i(x \mid s) = (\sigma\lambda)^{-1} f_Z(x/\sigma\lambda)\, I\{x > 0\} + f_Z(x)\, I\{x \le 0\}, \quad i = 1, \dots, n-1,$$

$$g_n(x \mid s) = (\sigma\lambda)^{-1} f_Z(x/\sigma\lambda)\, I\{x > 0,\ s \le \lambda - \lambda(1-a)^{n-1}\}
+ f_Z(x)\, I\{x \le 0 \text{ or } s > \lambda - \lambda(1-a)^{n-1}\}.$$

To generate a sample $Z$ from $g_i$ proceed as follows. Generate a candidate $Z'$ from
$f_Z$. If $Z' \le 0$ put $Z = Z'$ and if $Z' > 0$, put $Z = \sigma\lambda Z'$.
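
The sampling recipe for $g_i$ can be sketched in a few lines. This is an illustrative sketch only: `sample_fz` stands for any user-supplied sampler from $f_Z$, and `pareto_sampler`, `pareto_density` and `likelihood_ratio` are hypothetical helpers that assume the Pareto model $P(Z > x) = (1 + x)^{-\alpha}$. The likelihood ratio is $f_Z(z)/g_i(z \mid s)$, which for $z > 0$ equals $\sigma\lambda f_Z(z)/f_Z(z/\sigma\lambda)$.

```python
import random


def sample_scaling_mixture(sample_fz, sigma, lam, rng):
    """Draw Z from g_i: keep non-positive candidates, scale positive ones."""
    z_prime = sample_fz(rng)  # candidate Z' from the original density f_Z
    if z_prime <= 0:
        return z_prime
    return sigma * lam * z_prime


def pareto_sampler(alpha):
    """Inverse-transform sampler for P(Z > x) = (1 + x)^(-alpha) (assumed model)."""
    def draw(rng):
        u = 1.0 - rng.random()  # u lies in (0, 1]
        return u ** (-1.0 / alpha) - 1.0
    return draw


def pareto_density(alpha):
    """Density alpha * (1 + x)^(-alpha - 1) on x >= 0 (assumed model)."""
    return lambda x: alpha * (1.0 + x) ** (-alpha - 1.0)


def likelihood_ratio(z, fz, sigma, lam):
    """f_Z(z) / g_i(z | s): equal to 1 on z <= 0, rescaled density ratio on z > 0."""
    if z <= 0:
        return 1.0
    return sigma * lam * fz(z) / fz(z / (sigma * lam))


rng = random.Random(0)
z = sample_scaling_mixture(pareto_sampler(2.0), sigma=0.5, lam=100.0, rng=rng)
w = likelihood_ratio(z, pareto_density(2.0), sigma=0.5, lam=100.0)
```

Since the Pareto variables here are non-negative, essentially every candidate is scaled; the branch for $Z' \le 0$ matters for two-sided increment distributions.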

For the scaling mixture algorithm the conditions of Lemma 5.1 and Theorem
5.2 can be established with $c_0 < 1$. The techniques for doing this are completely
similar to the ones in Section 4.3 in Hult and Svensson (2009). The upper bound
in Theorem 5.2 holds where the functions $h_i$ are given by (see Hult and Svensson
(2009))

$$h_i(y_i \mid s_{i-1}) = \alpha\sigma\,[y_i^{\alpha+1} f_Z(y_i/\sigma)]^{-1}.$$






and the resulting upper bound $\varphi(c)$ in Propositions 4.2 and 4.4 is given by

~ X dyi. 



<p(c) 



Q' 



\ a L( Vi /X) 



5.4. Numerical computation of Value-at-Risk. We now consider a sum of $n$
Pareto-distributed random variables,

$$S_n = Z_1 + \dots + Z_n.$$

We will estimate quantiles of $S_n$ by using the importance sampling e.d.f. given
by the scaling mixture algorithm in Hult and Svensson (2009) (SM) as well as
the conditional mixture algorithm in Dupuis et al. (2007) (DLW). The changes of
measure are chosen by using the asymptotic approximation of the quantiles,
obtained by solving $nP(Z_1 > \lambda) = 1 - p$ for $\lambda$.

This approximation is based on the subexponential property, and since
$P(S_n > x) > nP(Z_1 > x)$ for positive random variables, it is smaller than the true quantile.
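
For the Pareto tail $P(Z_1 > x) = (1 + x)^{-\alpha}$ used in the tables, solving $nP(Z_1 > x) = 1 - p$ gives the closed form $x = (n/(1-p))^{1/\alpha} - 1$. The short sketch below (the function name `approx_quantile` is hypothetical) evaluates it; the values appear to agree with the "Approx." columns of Tables 1 and 2.

```python
def approx_quantile(n, p, alpha):
    """Solve n * (1 + x)^(-alpha) = 1 - p for x (Pareto tail assumed)."""
    return (n / (1.0 - p)) ** (1.0 / alpha) - 1.0


# n = 10 and n = 30 with alpha = 2 (Table 1) and alpha = 3 (Table 2)
table1_n10 = [approx_quantile(10, p, 2.0) for p in (0.99, 0.999, 0.99999)]
table2_n30 = [approx_quantile(30, p, 3.0) for p in (0.99, 0.999, 0.99999)]
```

For example, `approx_quantile(10, 0.99, 2.0)` is $\sqrt{1000} - 1 \approx 30.623$, well below the reference value 40.141 reported for that case.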

For $p$ equal to 0.99, 0.999 and 0.99999, we use the DLW algorithm $10^2$ times with
$N = 5 \cdot 10^4$ samples to obtain a reference value which we refer to as the true value
of the quantile.

We compare the performance of the quantile estimates based on $N = 10^4$ samples.
The estimation is repeated 100 times and the mean and standard deviation
of the estimates are reported.

We also include the results from standard Monte Carlo for comparison.
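
The standard Monte Carlo baseline can be sketched as follows. This is a minimal illustration, assuming the Pareto model $P(Z_1 > x) = (1 + x)^{-\alpha}$ and a plain order-statistic quantile estimate; the helper names are hypothetical and the exact quantile convention used in the experiments is not specified here.

```python
import random


def pareto_sum(n, alpha, rng):
    """One draw of S_n = Z_1 + ... + Z_n with P(Z_1 > x) = (1 + x)^(-alpha)."""
    return sum((1.0 - rng.random()) ** (-1.0 / alpha) - 1.0 for _ in range(n))


def mc_quantile(n, alpha, p, num_samples, rng):
    """Empirical p-quantile of S_n from num_samples plain Monte Carlo draws."""
    draws = sorted(pareto_sum(n, alpha, rng) for _ in range(num_samples))
    k = int(p * num_samples)  # index of the order statistic used as the estimate
    return draws[min(k, num_samples - 1)]


rng = random.Random(0)
estimate = mc_quantile(n=10, alpha=2.0, p=0.99, num_samples=10_000, rng=rng)

# Expected number of samples beyond the quantile when 1 - p = 1e-5 and N = 1e4
expected_tail_hits = 10_000 * (1 - 0.99999)
```

With $N = 10^4$ and $1 - p = 10^{-5}$ the expected number of samples beyond the quantile is only 0.1, which is why plain Monte Carlo cannot resolve the most extreme quantile at this sample size.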

Table 1. Simulations of $\lambda_p$ such that $P(S_n > \lambda_p) = 1 - p$, where
$S_n = \sum_{i=1}^n Z_i$ and $P(Z_1 > x) = (1 + x)^{-2}$. The number of samples
used for each estimate was $N = 10^4$ and the estimation was repeated 100 times.
Entries for SM, DLW and MC are Avg. est. (Std. dev.).

 n    1-p    True     Approx.   SM                 DLW                MC
 10   1e-2   40.141   30.623    41.007 (0.246)     40.166 (0.459)     40.038 (1.780)
 10   1e-3   108.49   99.000    109.33 (0.847)     108.29 (1.081)     84.821 (47.23)
 10   1e-5   1007.4   999.00    1003.1 (18.5)      1007.5 (1.51)      609.42 (1594)
 30   1e-2   84.622   53.772    85.841 (0.3950)    84.681 (1.237)     84.362 (2.739)
 30   1e-3   202.41   172.21    203.56 (1.530)     202.29 (2.400)     171.16 (71.26)
 30   1e-5   1759.5   1731.1    1753.7 (41.12)     1759.0 (1.487)     114.23 (443.5)
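
One way to read Table 1 is through the standard deviation relative to the reference value. The sketch below does this arithmetic for the $n = 10$ rows; the numbers are copied from the table, while the helper name `relative_std` and the dictionary layout are illustrative choices.

```python
# n = 10 rows of Table 1: reference value and (avg. est., std. dev.) per method
rows = {
    "1e-2": (40.141, {"SM": (41.007, 0.246), "DLW": (40.166, 0.459), "MC": (40.038, 1.780)}),
    "1e-3": (108.49, {"SM": (109.33, 0.847), "DLW": (108.29, 1.081), "MC": (84.821, 47.23)}),
    "1e-5": (1007.4, {"SM": (1003.1, 18.5), "DLW": (1007.5, 1.51), "MC": (609.42, 1594.0)}),
}


def relative_std(true_value, std_dev):
    """Standard deviation of the estimates divided by the reference value."""
    return std_dev / true_value


rel = {
    tail: {method: relative_std(true, sd) for method, (_, sd) in estimates.items()}
    for tail, (true, estimates) in rows.items()
}
```

At $1 - p = 10^{-5}$ the relative standard deviation of standard Monte Carlo exceeds 1, whereas for DLW it stays below 0.2 percent.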





5.5. Numerical computation of expected shortfall. Using the setting from
the previous section, we also calculate the expected shortfall for the case of a random
walk with Pareto-distributed increments. We first consider the case where $\alpha = 2$,
although it does not satisfy the conditions of Proposition 4.4.
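
For orientation, an expected shortfall estimate of the form $E(S_n \mid S_n > \lambda_p)$ can be computed from weighted samples as a ratio of two weighted sums. The sketch below shows this generic self-normalised form; it is an assumption for illustration and not necessarily the estimator analysed in Proposition 4.4. The weights play the role of likelihood ratios under importance sampling, and unit weights give the plain Monte Carlo version.

```python
def expected_shortfall(samples, weights, threshold):
    """Self-normalised estimate of E(S | S > threshold) from weighted samples."""
    num = sum(w * x for x, w in zip(samples, weights) if x > threshold)
    den = sum(w for x, w in zip(samples, weights) if x > threshold)
    if den == 0.0:
        raise ValueError("no sample exceeds the threshold")
    return num / den


# Plain Monte Carlo corresponds to unit weights
es_plain = expected_shortfall([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], threshold=2.0)
# Weighted samples, as produced by an importance sampling scheme
es_weighted = expected_shortfall([3.0, 5.0], [0.25, 0.75], threshold=0.0)
```

The function raises an error when no sample exceeds the threshold, which is the typical failure mode of standard Monte Carlo at extreme levels.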

Table 2. Simulations of $\lambda_p$ such that $P(S_n > \lambda_p) = 1 - p$, where
$S_n = \sum_{i=1}^n Z_i$ and $P(Z_1 > x) = (1 + x)^{-3}$. The number of samples
used for each estimate was $N = 10^4$ and the estimation was repeated 100 times.
Entries for SM, DLW and MC are Avg. est. (Std. dev.).

 n    1-p    True     Approx.   SM                 DLW                MC
 10   1e-2   14.190   9.0000    14.853 (0.090)     14.195 (0.154)     14.182 (0.305)
 10   1e-3   25.656   20.544    26.125 (0.171)     25.588 (0.412)     24.965 (2.212)
 10   1e-5   103.42   99.000    104.23 (0.799)     103.40 (0.553)     5.283 (16.03)
 30   1e-2   29.951   13.422    31.054 (0.287)     29.943 (0.519)     29.949 (0.500)
 30   1e-3   46.072   30.072    46.725 (0.286)     46.277 (1.041)     44.608 (2.688)
 30   1e-5   157.65   143.22    158.46 (1.080)     157.62 (0.273)     13.847 (28.53)





Table 3. Simulations of $E(S_n \mid S_n > \lambda_p)$, where $P(S_n > \lambda_p) = 1 - p$,
$S_n = \sum_{i=1}^n Z_i$ and $P(Z_1 > x) = (1 + x)^{-2}$. The number of
samples used for each estimate was $N = 10^4$ and the estimation
was repeated 100 times. Entries for SM, DLW and MC are
Avg. est. (Std. dev.) [Avg. time (s)].

 n    1-p    True value   SM                       DLW                      MC
 10   1e-2   71.795       73.065 (1.06) [0.845]    71.831 (1.22) [0.815]    72.252 (8.75) [0.702]
 10   1e-3   208.84       209.37 (3.60) [0.734]    209.30 (4.99) [0.724]    213.42 (65.8) [0.572]
 10   1e-5   2008.4       2009.8 (37.1) [0.866]    2009.3 (30.9) [0.822]    4787.8 (23168) [0.693]
 30   1e-2   139.22       140.55 (2.22) [1.189]    139.14 (3.09) [1.077]    140.76 (17.34) [0.903]
 30   1e-3   376.29       375.76 (5.00) [1.033]    378.24 (11.49) [0.936]   391.06 (96.36) [0.757]
 30   1e-5   3494.4       3500.0 (65.2) [1.301]    3496.9 (59.8) [1.224]    745.70 (3671) [0.991]
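
Because Table 3 reports both standard deviations and running times, the methods can also be compared on a cost-adjusted basis. The sketch below uses the product of variance and average time (often called work-normalised variance) for the row $n = 10$, $1 - p = 10^{-3}$; the figures are copied from the table, and the choice of this particular efficiency measure is an assumption made for illustration.

```python
# (std. dev., avg. time in seconds) for n = 10, 1 - p = 1e-3, copied from Table 3
table3_row = {"SM": (3.60, 0.734), "DLW": (4.99, 0.724), "MC": (65.8, 0.572)}


def work_normalised_variance(std_dev, seconds):
    """Variance of the estimate multiplied by the average computing time."""
    return std_dev ** 2 * seconds


wnv = {method: work_normalised_variance(sd, t) for method, (sd, t) in table3_row.items()}
```

For this row the slightly longer running times of SM and DLW are far outweighed by their smaller variance compared with standard Monte Carlo.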

Table 4. Simulations of $E(S_n \mid S_n > \lambda_p)$, where $P(S_n > \lambda_p) = 1 - p$,
$S_n = \sum_{i=1}^n Z_i$ and $P(Z_1 > x) = (1 + x)^{-3}$. The number of
samples used for each estimate was $N = 10^4$ and the estimation
was repeated 100 times. Entries for SM, DLW and MC are
Avg. est. (Std. dev.) [Avg. time (s)].

 n    1-p    True value   SM                        DLW                       MC
 10   1e-2   19.260       20.044 (0.167) [0.702]    19.257 (0.395) [0.727]    19.495 (0.905) [0.605]
 10   1e-3   36.658       36.911 (0.327) [0.7079]   36.463 (0.776) [0.731]    41.032 (5.53) [0.613]
 10   1e-5   154.74       154.39 (1.326) [0.708]    153.83 (2.705) [0.733]    132.74 (491.7) [0.607]
 30   1e-2   37.277       38.603 (0.902) [0.885]    37.200 (1.169) [0.923]    37.744 (…) [0.707]
 30   1e-3   62.090       62.013 (0.416) [0.912]    62.066 (1.814) [0.939]    69.369 (7.973) [0.712]
 30   1e-5   232.01       230.27 (1.92) [0.911]     230.00 (1.47) [0.935]     225.14 (932) [0.703]